Skip to content

Conversation

@tdayris
Copy link
Contributor

@tdayris tdayris commented Oct 21, 2025

This PR lets GATK meta-wrapper use pathvars syntax.

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test.py was updated to call any added or updated example rules in a Snakefile
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

  • New Features

    • Workflow paths are now centrally configurable via a new path template section, including defaults and per-sample placeholders for references, indexes, intervals, known variants, and input alignments.
    • Rules and outputs now use templated placeholders so results/logs follow the configured layout.
  • Tests

    • Test configs and expected paths updated to match the new configurable path scheme.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 21, 2025

📝 Walkthrough

Walkthrough

Parameterizes file paths in the GATK MuTect2 meta-wrapper by adding a pathvars section and replacing hard-coded inputs/outputs/logs in Snakemake rules with templated placeholders; tests and test config updated to use the new path layout.

Changes

Cohort / File(s) Summary
Metadata configuration
meta/bio/gatk_mutect2_calling/meta.yaml
Added a pathvars section with default and custom keys: per, genome_sequence, genome_sequence_index, genome_dict, bed_intervals, known_variants, known_variants_tbi, and bam_file (with comments).
Rule implementation (wrapper)
meta/bio/gatk_mutect2_calling/meta_wrapper.smk
Replaced hard-coded file paths across multiple rules with parameterized placeholders (e.g., <genome_sequence>, <genome_dict>, <genome_sequence_index>, <bed_intervals>, <known_variants>, <results>/.../<per>..., <logs>/...). Updated inputs, outputs, and logs for rules: create_dict, samtools_index, picard_replace_read_groups, sambamba_index_picard_bam, mutect2_call, gatk_get_pileup_summaries, gatk_calculate_contamination, gatk_learn_read_orientation_model, and filter_mutect_calls.
Test scaffolding
meta/bio/gatk_mutect2_calling/test/Snakefile, meta/bio/gatk_mutect2_calling/test/config.yaml
Snakefile: added configfile directive and adjusted meta_wrapper block form. config.yaml: added pathvars block with concrete paths/templates for genome files, intervals, known variants, per, and bam_file.
Test validation update
test_wrappers.py
Updated expected test path from variant/Sample1.filtered.vcf.gz.tbi to results/variant/Sample1.filtered.vcf.gz.tbi to match the new results-prefixed layout.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Possibly related PRs

Suggested reviewers

  • johanneskoester
  • fgvieira

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "feat: GATK Meta-wrapper update to follow pathvars" clearly and specifically summarizes the main objective of the changeset. It follows conventional commit style, is concise and readable, and directly communicates that the GATK meta-wrapper is being updated to support pathvars syntax. The raw_summary confirms this is indeed the primary change across all modified files (meta.yaml, meta_wrapper.smk, and test files), making the title fully related to and descriptive of the main change.
Description Check ✅ Passed The PR description fully follows the required template structure and completeness criteria. It includes a clear description of the change ("This PR lets GATK meta-wrapper use pathvars syntax"), and all QC checklist items are marked as completed with checkmarks. The author confirms adherence to the contributing documentation and specifically confirms all six key contributing requirements: test.py updates, arbitrary file path handling, automatic argument inference, proper temporary file handling, meta.yaml documentation links, and minimal conda environments. No required sections or confirmations are missing or incomplete.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5876525 and ccc92ad.

📒 Files selected for processing (1)
  • meta/bio/gatk_mutect2_calling/meta.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: docs
  • GitHub Check: testing
  • GitHub Check: Summary

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bcf3ab and 5876525.

⛔ Files ignored due to path filters (1)
  • meta/bio/gatk_mutect2_calling/test/reference/known.vcf.gz is excluded by !**/*.gz
📒 Files selected for processing (5)
  • meta/bio/gatk_mutect2_calling/meta.yaml (1 hunks)
  • meta/bio/gatk_mutect2_calling/meta_wrapper.smk (6 hunks)
  • meta/bio/gatk_mutect2_calling/test/Snakefile (1 hunks)
  • meta/bio/gatk_mutect2_calling/test/config.yaml (1 hunks)
  • test_wrappers.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • test_wrappers.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: testing
  • GitHub Check: docs
  • GitHub Check: Summary
🔇 Additional comments (4)
test_wrappers.py (1)

6562-6562: LGTM! Test path correctly updated to reflect new structure.

The path change from variant/Sample1.filtered.vcf.gz.tbi to results/variant/Sample1.filtered.vcf.gz.tbi correctly aligns with the new pathvars-based output structure defined in the meta-wrapper.

meta/bio/gatk_mutect2_calling/test/config.yaml (1)

1-9: LGTM! Configuration properly defines all required pathvars.

The pathvars configuration is complete and consistent with the documentation in meta.yaml. The per-sample templating using "{sample}" for the per pathvar and "mapped/{sample}.bam" for bam_file correctly enables flexible path resolution across the workflow.

meta/bio/gatk_mutect2_calling/test/Snakefile (1)

6-11: LGTM! Snakefile correctly adopts config-driven meta-wrapper pattern.

The addition of the configfile directive and the conversion to nested meta_wrapper block form are both correct. This enables the workflow to use the pathvars defined in config.yaml for flexible path resolution.

meta/bio/gatk_mutect2_calling/meta_wrapper.smk (1)

3-158: Verify hardcoded {sample} references in params are intentional.

The conversion to placeholders looks comprehensive and consistent. However, Lines 42 and 77 contain {sample} directly in the extra params strings:

  • Line 42: "--RGLB lib1 --RGPL illumina --RGPU {sample} --RGSM {sample}"
  • Line 77: " --tumor-sample {sample} "

This creates an implicit constraint that the per pathvar must be exactly "{sample}". If users configure per as "{sample_id}" or another pattern, these params would not expand correctly.

Consider either:

  1. Documenting this constraint in meta.yaml, or
  2. Making it more flexible by using the <per> placeholder in params where appropriate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant